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Abstract 

In the analysis of clustered and longitudinal data, which includes a covariate that 
varies both between and within clusters (e.g. time-varying covariate in longitudinal 
data), a Hausman pretest is commonly used to decide whether subsequent inference 
is made using the linear random intercept model or the fixed effects model. We 
assess the effect of this pretest on the coverage probability and expected length of 
a confidence interval for the slope parameter. Our results show that for the small 
levels of significance of the Hausman pretest commonly used in applications, the 
minimum coverage probability of this confidence interval can be far below nominal. 
Furthermore, the expected length of this confidence interval is, on average, larger 
than the expected length of a confidence interval for the slope parameter based on 
the fixed effects model with the same minimum coverage. 
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1 Introduction 


The linear random intercept model is commonly used in the analysis of clustered and 
longitudinal data. In clustered data the response variable is measured once on a unit 
where each unit is nested within a particular cluster of units. For example, analyzing the 
reading test score of children which are nested within classrooms (clusters). Longitudinal 
data, which can also be viewed as clustered data (see Rabe-Hesketh and Skrondal (2012, p. 
227)), is where the response variable is measured at different time points for each subject 
and where the measurements across time are nested within each subject (cluster). For 
example, analyzing the weights of individuals over time where the weight measurements 
across time are nested within individuals (clusters). When including a covariate in the 
linear random intercept model that varies both between and within clusters (e.g. time- 
varying covariate in longitudinal data) a preliminary Hausman (1978) test is commonly 
used to test the assumption of no correlation between the random intercept and covariate. 
If the Hausman pretest rejects the null hypothesis of no correlation between the random 
intercept and covariate then the fixed effects model is chosen for subsequent inference, 
otherwise the linear random intercept model is chosen. This preliminary model selection 
procedure has been widely used in econometrics (see e.g. Wooldridge, 2002 and Baltagi, 
2005) and has also been adopted in other areas such as medical statistics, see e.g. Gardiner 
et al. (2009) and Mann et al. (2004). The Hausman pretest has also been implemented in 
popular statistical computer programmes including SAS, Stata, eViews and R, see Ajmani 
(2009, Chapter 7.5.3), Rabe-Hesketh and Skrondal (2012, Chapter 3.7.6), Griffiths et al. 
(2012, Chapter 10.4) and Croissant and Millo (2008), respectively. 

The two-stage procedure widely used in the analysis of clustered and longitudinal data 
is as follows. In the first stage, the Hausman pretest is used to decide whether subsequent 
inference is made using the random intercept model or the fixed effects model (see e.g. 
Ebbes et al., 2004 and Jackowicz et al., 2013). The second stage is that the inference of 
interest is carried out assuming that the model chosen in the first stage had been given 
to us a priori, as the true model. Guggenberger (2010) considers this two-stage procedure 
when the inference of interest is a hypothesis test about the slope parameter. He provides 
both a local asymptotic analysis of the size of this test and a finite sample analysis (via 
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simulations) of the probability of Type I error. 

We consider the case that the inference of interest is a confidence interval for the slope 
parameter. Kabaila et ah, 2015 state 3 new finite sample results (Theorems 01 and [7] of 
the present paper) on the effect of a Hausman pretest on the coverage probability of the 
confidence interval for this parameter. They also provide outline proofs of these results and 
a brief initial analysis of the coverage properties of this confidence interval. These coverage 
properties are estimated by simulation. In the present paper, we describe how the third 
of these results can be used to substantially improve the efficiency of these simulations 
through the use of variance reduction by control variates. 

We compare the expected length of the confidence interval resulting from the two- 
stage procedure with the expected length of the confidence interval based on the fixed 
effects model, where the latter interval is adjusted to have the same minimum coverage as 
the former interval. The quantity used for this comparison is the scaled expected length, 
defined as the expected length of the interval resulting from the two-stage procedure divided 
by the expected length of the adjusted interval from the fixed effects model. The scaled 
expected length provides an insight into how useful the Hausman pretest is in this context. 
In the present paper, we provide 4 new finite sample theorems (Theorems in 0 m and 0 
on the scaled expected length. The scaled expected length is estimated by simulation. We 
describe how to substantially improve the efficiency of these simulations through the use 
of variance reduction by control variates. 

The results that we present make it easy to assess, for a wide variety of circumstances, 
the effect of the Hausman pretest on the coverage properties and scaled expected length 
of the confidence interval for the slope parameter. Our results show that when the usual 
small nominal level of significance for the Hausman pretest is used, the two-stage procedure 
can result in a confidence interval with minimum coverage probability far below nominal. 
However, if the nominal level of significance is increased to 50% then the minimum coverage 
probability is much closer to the nominal coverage (see Figures 1, 2 and 3). We also show 
that, in terms of expected length, the confidence interval resulting from the two-stage 
procedure is consistently outperformed by the confidence interval based on the fixed effects 
model, regardless of the nominal level of significance chosen for the Hausman pretest (see 
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Tabic 1). The results presented in this paper were computed using programs written in the 
R programming language, which will be made available in a convenient R package. 

In Section 2, we consider the practical situation that the random error and random effect 
variances are estimated from the data. We consider three estimators of these variances: 
the usual unbiased estimators, the maximum likelihood estimators of Hsiao (1986) and the 
estimators of Wooldridge (2002). The coverage probability and the scaled expected length 
of the confidence interval resulting from the two-stage procedure are determined by 4 known 
quantities and 5 unknown parameters. The known quantities are the number of individuals, 
the number of time points (for the longitudinal data case), the nominal significance level 
of the Hausman pretest and the nominal coverage probability of this confidence interval. 
The unknown parameters are the random error variance, the random effect variance, the 
variance of the covariate (or time-varying covariate in the case of longitudinal data), a scalar 
parameter that determines the correlation matrix of the covariates and a non-exogeneity 
parameter. 

If, for given values of the 4 known quantities, we wish to assess the dependence of the 
coverage probability and the scaled expected length of the confidence interval resulting 
from the two-stage procedure on the 5 unknown parameters then we might consider, say, 
five values for each of these unknown parameters, leading to 3125 parameter combinations. 
Apart from the daunting task of summarizing so many results, it is possible that one might 
miss important values of the unknown parameters, such as values for which the coverage 
probability is particularly low or the scaled expected length is particularly large. 

Theorems [T| and [5] state that, apart from the known quantities, the coverage probability 
and scaled expected length are actually determined by only 3 unknown parameters, includ¬ 
ing the non-exogeneity parameter. If we compute the minimum coverage probability with 
respect to the non-exogeneity parameter then we have only 2 unknown parameters and 
our assessment of the coverage properties of the confidence interval resulting from the two- 
stage procedure is greatly simplified. Theorems [2] and [6] state that the coverage probability 
and scaled expected length are even functions of the non-exogeneity parameter, so that 
computation time is halved. We also propose a scaling of the non-exogeneity parameter 
that takes account of the sample size. In effect, this scaling reduces the number of known 
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quantities that determine this coverage probability from 4 to 3. 

In Section 3, we consider the coverage probability of the confidence interval resulting 
from the two-stage procedure when the random error and random effect variances are 
assumed to be known. Theorem [7] shows that the conditional coverage probability of 
the confidence interval resulting from the two-stage procedure can be found exactly by 
the evaluation of the bivariate normal cumulative distribution function. This theorem is 
important because it is used to reduce the variance of the simulation based estimators of the 
coverage probability and scaled expected length of the confidence interval resulting from 
the two-stage procedure (when random error and random effect variances are estimated). 
As we show in Section 4 this variance reduction is achieved by using control variates. 

2 The model and the practical two-stage procedure 
(random error and random effect variances are 
estimated) 

We focus on the case of longitudinal data for which i denotes the individual (i — 1,..., N) 
and t denotes the time period (t = 1,..., T). By interpreting i as the cluster index and 
t as the unit of analysis, our results also apply to the analysis of clustered data. Part 
of the following description of the model and two-stage procedure is taken from Kabaila 
et al., 2015. Let yu and Xu denote the response variable and the time-varying covariate, 
respectively, for the Pth individual at time t . Suppose that 


Vit — « + fan + y-i + Sit, (1) 

where the e it ’s and the (fJ>i,Xn ,..., x^)’s are independent, the e it ’s are i.i.d. iV(0, of) and 
the /ids are i.i.d. N( 0, of). We call /3 the slope parameter, of the error variance and of the 
random effect variance. The e it ’s and the /ids are unobserved. Suppose that the parameter 
of interest is the slope parameter f3 and that the inference of interest is a confidence interval 
for f3. 

Also suppose that the (//*, xa ,..., Xit)’s are i.i.d. multivariate normally distributed 
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with zero mean and covariance matrix 


or 


'eerier . r e 


Tcr^cr x e 


olG 


( 2 ) 


where e is a T vector of l’s, G is a T x T matrix with l’s on the diagonal and p on the 
off-diagonal (compound symmetry), and f is a parameter that measures the dependence 
between pi and (xn,... ,Xix). We define the “non-exogeneity parameter” r = r(T/(l + 
(T — l)p)) and note that it is a correlation, so r G (—1,1). If r = 0 then the x^s are 
exogenous variables. Let x = (in,..., x\ t, X 21 , ■ ■ ■, x 2 t, ■ ■ ■, %ni, ■ ■ ■, %nt )• In simulations, 
we generate x as follows. Suppose that %,..., u\t, ■ ■ ■ ,um, ■ ■ ■, unt, Mi, ..., % are i.i.d. 
iV(0,1). Also suppose that these random variables and the e it ’s are independent. Let 
x it /cr x = (1 — p) l t 2 u it + p 1 ^ 2 Vi for i = 1,..., N and t = 1,... ,T. The distribution of 
Pi conditional on (xn ,..., x ^), found from (J2|) , is used to generate (pi ,..., p^). Let 

u = (till, • • • , Mir, • • • , Mjvi, . . . , Mjvr, Ml, ... , Vn). 

Assume, for the moment, that a £ and cr M are known. When r = 0, a confidence interval 
for j3 may be found as follows. Let ip = er^/oe. Condition on x and use the GLS estimator 
{3 (ip) of j3. Let z c = $ _1 (c), where <f> denotes the Af(0,1) cdf. Define the following conhdence 
interval for f3 

J(0) = j3(ip) - Zi- a / 2 (V&r 0 (j3(ip) |x)) 1/2 , j3(ip) + Zi_ Q/2 (Var o (/3(0) |r)) i/2 , 

where Var o (/3( , 0) | x) denotes the variance of f3(ip), conditional on x when r = 0. The 
conhdence interval I (ip) has coverage probability 1 — a when r = 0. Averaging (|TJ) over 
t = 1,..., T for each i = 1,..., N we obtain 


Vi = a + /3xi + pi + Si, 


(3) 


where 


Vi = Vit ’ Xi = f Xit and 
t =1 t =1 


rj~i 




t =1 


This model is called the between effects model. When r = 0, an alternative estimator of [3 
is 13b, the OLS estimator based on the model ([3]), when we condition on x. This estimator 
does not require a knowledge of ip. Subtracting (j3]) from ([!]) , we obtain 


Hit ~ Hi = /3{x it - Xi) + (£ it - Si). 


(4) 
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This model is called the fixed effects model. We estimate (3 by Ar> the OLS estimator 
based on this model. Define the following confidence interval for f3 


J(a £ ) = (3 W ~ zi-a /2 (Var(Av I A) 1/2 , An + D-a /2 (Var (fi w I x)) 


i 1 / 2 


where Var(Av \ x) denotes the variance of / 3w, conditional on x. Irrespective of the value of 
r, the confidence interval J(cr £ ) has coverage probability 1 — a. In practice, we do not know 
whether or not r = 0. The usual procedure is to use a Hausman pretest to test H 0 : t = 0 
against ff„ : r ^ 0. We consider this pretest, based on the test statistic 

(Pw — Pb) 2 


H(a £ , <7 m ) = 


Var(Av | x) + Var 0 (p B \ x) 


(5) 


where Var o(Pb \ x) denotes the variance of f3s conditional on x and assuming that r = 0. 
This test statistic has a x\ distribution under H 0 . Suppose that we accept H 0 if H(a e , cr /t ) < 
z i-a/ 2 i otherwise we reject H 0 . Note that 5 is the level of significance of this test. We now 
describe the two-stage procedure. If H 0 is accepted then we use the confidence interval 
/(A); otherwise we use the confidence interval J(a £ ). Let K(a £1 ay) denote the confidence 
interval, with nominal coverage 1 — a, that results from this two-stage procedure. 

Of course, in practice, a £ and a M are not known and need to be estimated. So, in 
practice, the Hausman pretest is based on the test statistic H(a £ , ay ) and the two-stage 
procedure results in the confidence interval K (cb, <5 /t ) where d £ and a^ denote estimators 
of a £ and <r M , respectively. 


2.1 The coverage probability of the confidence interval resulting 
from the two-stage procedure 

The coverage probability of the confidence interval constructed from this two-stage proce¬ 
dure is P(/3 G K(a £ , d f jj). The following two theorems (stated by Kabaila et al. 2015) give 
properties of this coverage probability. 

Theorem 1 For (d £ ,d M ) any of the pairs of estimators listed in Appendix B, the uncon¬ 
ditional coverage probability P(f3 G K(a £ , a M )) is determined by N (the number of indi¬ 
viduals), T (the number of time points), d (the nominal significance level of the Hausman 
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pretest), 1 — a (the nominal coverage probability), ip (the ratio cr fl /cr £ ), p (the parameter 
that determines G) and r (the non-exogeneity parameter). Given these quantities, the cov¬ 
erage probability does not depend on either a 2 (the variance of the random error) or a 2 
(the variance of the random effect) or a 2 (the variance of the time-varying covariate x it ). 

Theorem 2 Suppose that N, T, a, 1 — a, ip and p are fixed. When a £ and a^ are replaced 
by any of the pairs of estimators listed in Appendix B, the unconditional coverage probability 
P(/3 G K(a £ ,a fl )) is an even function of r G (—1,1). 

Outline proofs of these results are provided by Kabaila et ah, 2015. Detailed proofs 
are provided in the supplementary material for the present paper. A remarkable feature 
of the proofs of these theorems is that they are carried out without relying on a simple 
expression for the coverage probability P{(3 G K(a £ ,a l f)). We use simulations to compute 
P(/3 G K(a e , <r M )), employing variance reduction by control variates, as described in Section 
4. Using Theorem |2j we only need to consider r in the interval [0,1), which means that 
we have reduced the number of simulations needed to estimate the coverage probability 
function (or its minimum) by half. 

We now examine the influence that the nominal level of significance a of the ffausman 
pretest has on the coverage probability function P(/3 G K(a £l o^)). Suppose that d £ and 

are the usual unbiased estimators of a £ and a respectively (described in Appendix 
B). Consider p = 0.3, N = 100, T = 3, ip = o^/a £ = 1/3 and the nominal coverage 
probability 1 — a = 0.95. In practice, it is common to use a small value of 5, such as 0.05 
or 0.01. As noted by Guggenberger (2010), examples of practical applications that have 
used a small a for the Hausman pretest are provided by Gaynor et al. (2005, p.245) and 
Bedard and Deschenes (2006, p.189). Figure 1 presents graphs of the coverage probability 
P(/3 G K(a e ,d^)), considered as a function of A = N l ^ 2 r. Each graph is computed using 
the variance reduction method and the common random numbers (to produce smoother 
graphs) described in Section 4. The number of simulation runs used to compute each graph 
is M = 20000. The bottom (with circle points) graph is for nominal significance level 
a = 0.05 of the ffausman pretest. This graph falls well below the nominal coverage for a 
wide interval of values of A, with the minimum of the coverage probability approximately 
equal to 0.75. Suppose that we choose the significance level of the ffausman pretest to 


be quite large, say a = 0.50. Now the Hausman pretest is more likely to reject the null 
hypothesis that r = 0 and therefore more likely to choose the fixed effects model for the 
construction of the confidence interval. The top (with triangle points) graph is for nominal 
significance level a = 0.50 of the Hausman pretest. Although this graph is still below the 
nominal coverage, there has been a large improvement. 



0 2 4 6 8 10 

A 


Figure 1: Graphs of the coverage probability functions of the confidence interval resulting 
from the two-stage procedure, for nominal coverage probability 0.95. Here A = iV 1//2 r, 
where r is the non-exogeneity parameter, p = 0.3, N = 100, T = 3 and 0 = 1/3. Two 
nominal levels of significance, a = 0.05 and a = 0.5, of the Hausman pretest are considered. 

As noted in the Introduction and in Section 2, if we compute the minimum over r of the 
coverage probability P(/3 G K(a e , a M )) then we are left with only two unknown parameters, 
0 and p. If we fix 0 then the minimum coverage depends only on p, where p € (—1,1), 
as it is a correlation. Suppose that d e and <50 are the usual unbiased estimators of a £ 
and cr^, respectively. Suppose that N = 100, T = 3, 0 = a^/cJe = 1/3 and the nominal 
coverage probability 1 — a = 0.95. Figure 2 presents graphs of the coverage probability 
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P((3 G K(a E , o/J), minimized over r, considered as a function of p (p > 0). It can be 
seen that the coverage probability, minimized over r, is a decreasing function of p. This 
is because as p increases, Xu — X{ becomes closer to 0 for t = 1,... ,T (i = 1,... ,N), 
causing (3w to become a very inaccurate estimator of f3. This then causes the test statistic 
//(ay, ay) to have very little power to detect non-zero values of r. Each estimate of the 
minimum coverage is found using the common random numbers and the variance reduction 
method described in Section 4. Similarly to Figure 1, we see a vast improvement in the 
minimum coverage by letting a = 0.50 rather than choosing a to be the commonly used, 
smaller value 0.05. 



0.0 0.2 0.4 0.6 0.8 

P 


Figure 2: Graphs of the coverage probability functions, minimized over the non-exogeneity 
parameter r, of the confidence interval resulting from the two-stage procedure for nominal 
coverage probability 0.95. This minimum coverage is considered as a function of p, the 
parameter that determines G. Two nominal levels of significance, a = 0.05 and a = 0.5, 
of the Hausman pretest are considered. Here N = 100, T = 3 and ^ = 1/3. 
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In practice, ip is not known and must be estimated from the data. However, it is 
commonly observed in practice that p takes small to moderate values, and not values 
close to 1. This suggests that we fix p and plot the graph of the coverage probability 
P(j3 G K(a £1 ay,)), minimized over r, as a function of %Suppose that ay and ay, are the 
usual unbiased estimators of oy and ay,, respectively. Consider N = 100, T = 3, the nominal 
coverage probability 1 — a = 0.95 and p = 0.4. Figure 3 presents graphs of the coverage 
probability P(f3 G K(a £ , ay,)), minimized over r, as a function of ip. For nominal significance 
level a = 0.05 of the Hausman pretest, this minimized coverage probability is far below 
the nominal coverage for ip approximately equal to 0.2. However, for nominal significance 
level a = 0.5 of the Hausman pretest, we see (once more) a dramatic improvement in the 
minimum coverage probability. 

2.2 Comparison of the two-stage interval with the adjusted 
interval based on the fixed effects model 

The following notation is introduced to describe the expected length of the two-stage 
confidence interval. Let x = (ASSB = 'Y^ =l {x l — x) 2 (“sum of 
squares between 1 ') and SSW = Y2iLiY^t=i( x it ~ T,) 2 (“sum of squares within”). Let 
r(x) = SSB/SSW and g(^,T) = + (1/T). Now define w = q('ip,T)/(q('^,T) + r(x)) 

and w = g('0,T)/(g('0,T) + r(x)), where ip = oy,/^. 

We use the notation 

{ 1 if A is true 
0 if A is false 

where A is an arbitrary statement. Let B be the statement H(a s ,a fJ ,) < zf_~, 2 . The 
expected length of K (ay, ay,) is 

2z 1 . a/2 E{a £ (SSW)' 172 (w 1 / 2 I(B)+X(B c ))^ . 

Values of p close to 1 are unlikely to be encountered in practice. We therefore restrict 
attention to p G [0, jo] for some given p G (0,1). Let c min denote the coverage probability 
of K(a e , ay,), minimized over r G (—1,1), ip G (0, oo) and p G [0, jo]. 
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Figure 3: Graphs of the coverage probability functions, minimized over the non-exogeneity 
parameter r, of the confidence interval resulting from the two-stage procedure for nom¬ 
inal coverage probability 0.95. This minimum coverage is considered as a function 
of -0 = (random intercept standard deviation)/(random error standard deviation), where 
N = 100, T = 3 and p = 0.4. Two nominal levels of significance, a = 0.05 and a = 0.5, of 
the Hausman pretest are considered. 


Let 


Jc(&e) 


Pw - * _1 ((c + l)/2) (Var(/V | x )) 1/2 , p w + + l)/2) (Var (/3 W \ x )) 1/2 

Pw - $ _1 ((c + l)/2) a E / (SSW) 1 / 2 , fa + $ _1 ((c + l)/2) ^/(SSW) 1 / 2 . 


( 6 ) 


The following new theorem allows us to easily compute P(f3 e J c (<7 e )) for any given c G 

( 0 , 1 ). 


Theorem 3 P({3 G J c (<7 £ )) does not depend on any unknown parameters. 

The proof of Theorem [3] is provided in the supplementary material. Define c* to be the 
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value of c such that P{/3 6 J c (<7 £ )) = c min . Let ./ a dj(cL) be J c {cr £ ) for c = c*. In other words, 
Jadj{&e) is the confidence interval based on the fixed effects model, when a £ is estimated 
from the data, that is adjusted to have the same minimum coverage as K(a £ ,a /J ). The 
expected length of Jadj(u £ ) is 

((c* + 1) /2) E (a £ (SSW) _1/2 ) . 

The scaled expected length of K(a e , ojj is defined to be the expected length of K(a e , af) 
divided by the expected length of ./ a dj(<5y)- Let %. = T' 1 Ylt=i u u- A useful expression for 
this scaled expected length is given by the following result. 


Theorem 4 The scaled expected length of K(cr e , a f) is equal to 

E (Ei, EL(“« - “i) 2 )~ 1/2 


Zl-a/2 


1 ((c* + l)/2) 


(9,/a c ) (EiiEf,i(«i<-5.) 2 ) 


-1/2 


(7) 


A proof of this result is provided in the supplementary material. If ([T]) is less than 
1 then K[a £ , a t ,) is a shorter interval on average than J a dj(Sb). The following two new 
theorems describe important properties of the scaled expected length. Outline proofs of 
these theorems are given in Appendix C. Detailed proofs are provided in the supplementary 
material. 


Theorem 5 For (er £ ,cr M ) any of the pairs of estimators listed in Appendix B, the uncondi¬ 
tional scaled expected length given in (j7|) is determined by N (the number of individuals), 
T (the number of time points), a (the nominal significance level of the Hausman pretest), 
1 — a (the nominal coverage probability), if (the ratio a^/a £ ), p (the parameter that deter¬ 
mines G) and r (the non-exogeneity parameter). Given these quantities, the scaled expected 
length does not depend on either a') (the variance of the random error) or a 2 (the variance 
of the random effect) or o 2 x (the variance of the time-varying covariate Xu). 

Theorem 6 Suppose that N, T, a, 1 — a, if and p are fixed. When a £ and a^ are replaced 
by any of the pairs of estimators listed in Appendix B, the unconditional scaled expected 
length given in ([7]) is an even function of r 6 (—1,1). 
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Suppose that a £ and cr M are the usual unbiased estimators of cr £ and cr^, respectively. 
Consider the case that N = 100, T = 3, 1 — a = 0.95 and a G {0.05,0.50}. We find c m ; n 
by minimizing the coverage probability over r G [0,1), 0 G (0, oo) and p G [0,0.8], since it 
is reasonable to expect that p < 0.8 in practice. For given a and p, we define min SEL and 
max SEL to be the scaled expected length minimized and maximized, respectively, with 
respect to r G [0,1) and 0 G (0, oo). For each a G {0.05, 0.50} and p G {0, 0.2, 0.4,0.6, 0.8}, 
we have computed min SEL and max SEL. This information is shown in Table 1. From 
Table 1 it is clear that, for this example, -/ a dj(3y) is the shorter interval on average. The 
minimum scaled expected length is a decreasing function of p because as p increases, w 
decreases. 


a 

0.05 





0.50 





P 

0 

0.2 

0.4 

0.6 

0.8 

0 

0.2 

0.4 

0.6 

0.8 

min SEL 

4.06 

3.67 

3.23 

2.70 

1.99 

1.65 

1.57 

1.49 

1.38 

1.24 

max SEL 

4.88 

4.88 

4.87 

4.89 

4.89 

1.81 

1.81 

1.81 

1.81 

1.81 


Table 1: Scaled expected lengths, both minimized (min SEL) and maximized (max SEL) 
over r and 0, for given values of a and p. 


Remark: For the analysis of longitudinal data it is usually more appropriate to consider G 
to be the matrix where the (i, j)’th element is p^~^ (Erst order autoregression). Theorems 
mil and [6] are true for G defined in this way. Details on how to define r in this case are 
provided in the supplementary material. 

3 The coverage probability when random error and 
random effect variances are assumed known 

When <j £ and crp are known, the confidence interval resulting from the two-stage procedure is 
denoted by K(cr £ , cr M ). In this section we describe how the coverage probability of K(a £ , a^), 
conditional on x, can be computed exactly using the bivariate normal distribution. The 
results of this section are used in Section 4 to find control variates (used for variance 
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reduction) for the estimation by simulation of P(/3 G K{a e ,a^)) and the scaled expected 
length (given by (j7|)). 

Let P(/3 G K(a £ ,afj) | x) denote the coverage probability of K{a e , cr /t ), conditional on 
x. Observe that P(J3 G K(a e ,a /i ) | x) is equal to 

P(/3 e /(a £ ,a M ), H(cr £ , cqj < z\_~ /2 x) + p(/ 3 G J(cx £ ), H(a e ,a^ > z 2 _ 5/2 x) 

= P{\9l\ < ^1—a/2) |/l| < 0-5/2 | + P(|<?,| < O—a/2) A| > 0-5/2 | x) , (8) 

where 5 -/ = (A/0) - ^)/(Var o (^('0)|a;)) 1/2 , fi-j = (Av - /3)/(Var(Av|x)) 1/2 and h = 0 W ~ 
Pb)/ (Var( / d^y|x) + Varo(/ds|x)) 1//2 . By the law of total probability, (J8]) is equal to the sum 
of (1 — a) and 


P(\9i\ < O -a/2) \h\ < 0-5/2 | %) ~ P{\9j\ < O—a/2) A| < 0-5/2 | ^) • 


(9) 


The first and second terms in this expression are determined by the conditional distributions 
of the random vectors ( gi , h) and (gj, h), respectively. Theorem [7] gives these distributions, 
which requires the introduction of p 2 (x) = SSB/Var(x;), where Var(xj) is given in Appendix 
A. The following theorem is stated by Kabaila et al 2015, together with an outline proof. 
A detailed proof is proved in the supplementary material for the present paper. 

Theorem 7 Conditional on x, (gj , h) and ( gj , h) have bivariate normal distributions, 
where E(gj | x) = 0, Var(pj | x) = 1, 

T'l/jp^x) r - | _\ i T 2 ljj 2 


E(gi | x) = 


(q^,T) + q 2 ^,T)/r(x)) 


1/2 


, Var (gj \ x) = 1 


?('0, P) + q 2 {'4’i T)/r(x) ’ 


E(h | x) = , , xhAA Var(ft | *) = 1 tV 


(r(x) + g( , 0, T)) 1 / 2 ' 


r(x) + T ) ’ 


Cov(# 7 , h | x) = 


T 2 ^ 2 


(«#, T)r(x) + g 2 ('0, T)) 1/2 (l + T)/r(x)) 


T/2 


and Cov((/j, h | x) = 


(l + #,A r (x)) 


1 / 2 - 
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Thus, when a £ and a M are known, P(/3 e K(a £ , cr M ) | x) can be found easily by evaluation 
of the bivariate normal cumulative distribution function in the expression ([9]). Similarly 
to Theorem [lj this probability is determined by N, T, x, a, 1 — a, ip, p and r. Note that 
the dependence on p is through p(x). Also, similarly to Theorem i P(P e K (a £ ,a^)\x) 
is an even function of r G (—1,1). These results may be proved using similar, but much 
simpler, arguments to those used in the proofs of Theorems [Tj and [2] which are given in the 
supplementary material. 

4 Simulation methods, including the use of variance 
reduction, when the random error and random 
effect variances are unknown 

In Section 3 we described how to find the coverage probability of the confidence interval 
resulting from the two-stage procedure, conditional on x when a £ and a M are known, us¬ 
ing the bivariate normal distribution. In the practically important case that a £ and cr /t 
are replaced by estimators, the results of Section 3 allow us to find control variates (for 
variance reduction) for the estimation by simulation of both the coverage probability and 
the scaled expected length. In Section 4.1 we describe the estimation by simulation of the 
coverage probability and in Section 4.2 we describe the estimation by simulation of the 
scaled expected length. The simulation methods described in this section apply to any of 
the pairs of estimators (cr e ,cf M ) listed in Appendix B. 

We consider the model ([!]) and choose the intercept a = 0, the parameter of interest 
P = 0 and the values for N, T, a, 1 — a, a 2 , a 2 and a 2 . Of course, by Theorem [l] and 
Theorem [5} the coverage probability and the scaled expected length do not depend on either 
a, /3 or a 2 and depend on a 2 and a 2 only through ip = a lL la £ . Our simulation methods 
consist of M independent simulation runs. On the fc’th simulation run (k = 1 
we generate observations of the e^’s and (//*, Xu,..., Xir)’s using the assumptions made 
in Section 2, i.e. the e it ’s are i.i.d. N(0,cr 2 ) and the (/x*, Xu,..., £jx)’ s are hi.d. with 
a multivariate normal distribution with mean 0 and covariance matrix ([2]). As noted in 
Section 2, in simulations x is determined by u, which we write as x = x(u). Let x k and u k 
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denote the observed values of x and u, respectively, for the k 'th simulation run. 

4.1 Estimating the coverage probability by simulation 

For the observed values in the fc’th simulation run, we compute the following three quan¬ 
tities. The confidence interval resulting from the two-stage procedure, when cr £ and 
are assumed known, denoted by K k {a e , a fl ). The confidence interval resulting from the 
two-stage procedure, when a e and cr M are estimated by a £ and cx M , respectively, denoted by 
Kk(a £ ,a/j,). The coverage probability of K(a e , a^), conditional on x k , when a £ and a M are 
assumed known, is P(/3 G K(a s ,a^) | x fc ). Note that this conditional coverage probability 
is computed exactly using the bivariate normal distributions given in Theorem [7J 

Let CP = P(/3 G K(a £ ,a fJi )), the coverage probability of K(a £ , ay). Now define the 
unbiased estimator 

_ 1 M 

CP = jj G K k@ e ,ap)) 

1 k= 1 

of CP. This is the usual “brute-force” simulation estimator of CP. We estimate the variance 
of this estimator by noting that it is a binomial proportion. Let CPK = P(/3 G K(a e , cr M )), 
the coverage probability of K(a e , cr M ), when a £ and a M are assumed known. Now define the 
unbiased estimator 

1 M 

C pK =-^l(^^ ei a,)) 

1 k= 1 

of CPK. By the double expectation theorem, CPK = E x (P{f3 G K((J E , cr /t ) |x)). Thus 
another unbiased estimator of CPK = P(/3 G K(a £ , ay)) is 

_ i M 

CPK = P (P G K I xk ) ’ 

1 k= 1 

which is a much more accurate estimator of CPK than CPK. 

Define the control variate CPK — CPK, which has expected value zero. The simulation- 
based unbiased estimator of CP = P(/3 G K(d e , cx^)) that employs variance reduction using 
this control variate, is 

CP = CP - (CPK - CPk) . 

We expect that the correlation between CP and CPK will be close to 1. Since CPK is a 
much more accurate estimator of CPK than CPK, we expect that the correlation between 
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CP and the control variate CPK — CPK will also be close to 1. Note that 
1 M 

CP = ( X (£ G K kid S) d,))-l(p e K k (a £ ,a,))+P(p e K(a £ ,a^\x k )y 

1 k= 1 

We estimate the variance of this estimator by noting that it is an average of i.i.d. random 
variables. 

We evaluate the efficiency gain of using CP to estimate the coverage probability CP over 
CP, as follows. Let T and T denote the times taken to carry out M simulation runs when 
we estimate CP by CP and CP, respectively. The efficiency of the control variate estimator 
CP relative to the “brute-force” estimator CP is 

T Var(CP) 

T Var(CP)' 

The larger this relative efficiency, the greater the gain in using the control variate estimator 
CP, by comparison with using the “brute-force” estimator CP. To give an example of the 
efficiency gained by using CP compared to CP, when estimating CP, we set p = 0, N = 100, 
T = 3, r = 0, i/j = 1/3, a = 5 = 0.05 and M = 10,000. We obtain T = 179.37 seconds, 
T = 211.51 seconds, Var(CP) = 5.613591 x 1(T 6 and Var(CP) = 1.39591 x KT 6 . The 
time ratio is T/T = 0.848045 and the variance ratio is Var(CP)/Var(CP) = 4.92597, so 
the efficiency of CP relative to CP is approximately 4.17. In other words, it would take 
approximately 4.17 times as long to compute the “brute-force” estimator with the same 
accuracy as the control variate estimator. 

We also use common random numbers to create smoother plots of the estimated cover¬ 
age probability, as a function of A. The estimates of the coverage probability are computed 
for an equally-spaced grid of values of A. On the /c’th simulation run we generate an ob¬ 
servation of (/ij, Xu ,..., Xirr) from observations of the random numbers Un,..., UiT and Uj. 
So, on the k't\i simulation run, for each value of A in the grid, we use the same random 
numbers that are used to generate the observations of the e it 's and the (fJ>i,xn, ..., x^t)’s. 
These observations are then used to construct our simulation-based estimate of CP. There¬ 
fore on the /c’th simulation run, for each value of A, we have an estimate of the coverage 
probability using the same random numbers. 
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4.2 Estimating the scaled expected length by simulation 


When estimating (Jr]) by simulation we consider the expected values in the numerator and 
denominator seperately. We begin with the expected value in the numerator of (J7|). Let 
LRd = (cr E /a £ ) (w l l 2 Z(B) + Z{B C )) YlJ=i( u it — rh) 2 j and let LKj. be the value 

of LK^ when u = u k . An unbiased simulation estimator of the expected value of LK^ that 
does not use any variance reduction technique is 

_ i M 

LK t = -V LKl 

M k 

k =1 


1 /2 

Now let LKK^ = {w x ! 2 Z{C) + Z{C C )) YlJ=i( u it ~ th ) 2 j , where C is the statement 

—zi-a /2 < h < zi-a/ 2 ■ Also let LKKj. be the value of LKPC when u = u k . Then 
A(LKK t | u ) is equal to 


N T 


- 1/2 


E (w 1 / 2 X(C) + X(C C ) | u) 


,i= 1 t=l 


N T 


- 1/2 


(w 1/2 P(C I x) + P{C C I x)) S - 


uZ 2 


,i=l t=1 


where x = x(u). We compute P{C \ x ) using Theorem [7j An unbiased simulation estimator 
of the expected value of LK^ that makes use of a control variate for variance reduction is 



1 

M 


E ( LK ! 


^LKKj, — E (LKK t | . 


Next we will consider the expected value in the denominator of Let LJ f = 

(a £ /< 7 e ) (^2iL\Ylt=i( u it — Ui ) 2 j and let Ljj, be the value of LJ^ when u = u k . An 

unbiased simulation estimator of the expected value of LJ^ that does not use any variance 
reduction technique is 

_ i M 

lj, = mE l 4- 

k =1 
X/2 

Now let LJK^ = (^2^ = iYlt=i( u it ~~ u ^ 2 j and let LJKj, be the value of LJK^ when 

u = u k . Note that Ylf=i Y^t=i( u n ~ th ) 2 ~ Xn(t- i)- ^ follows from this that E (LJK^) = 
2~!/ 2 ('p ((N(T — 1) — 1) /2) /T (( N(T — 1)) /2)). An unbiased simulation estimator of the 
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expected value of LJ^ that makes use of a control variate for variance reduction is 



1 

M 



f LJK^ - 2 -1 / 2 


r((V(T-l)-l)/ 2 ) \\ 
r((JV(T-l))/2) ))■ 


Therefore, a simulation estimator of the scaled expected length that makes use of control 
variates for variance reduction is 

Zl-a/2 LK 

4>- 1 ((c-+ l)/2) ' 

We estimate the variance of LK^, LK^, LJ^ and LJ^ by noting that each estimator is 
an average of i.i.d. random variables. Using R we can assess the relative efficiency of 
the estimators that use control variates for variance reduction with their “brute-force” 
counterparts using a similar method to that discussed in Section 4.1. For example, for 
known quantities N = 100, T = 3, a = 0.05 and unknown quantities t — 0, p — 0, cr e = 1, 
cr /t = 1 and a x — 1, and for M = 1000 independent simulation runs, we ford that the 
efficiency of LK^ relative to LK^ is approximately 1 and the efficiency of LJ^ relative to 
LJ^ is approximately 2. However, if we increase p to 0.8 (leaving the other parameters 
unchanged), the efficiency of LK^ relative to LK^ increases to approximately 3. Hence, 
estimating the scaled expected length using control variates is well justified. 

Common random numbers are also used for the estimation by simulation of the scaled 
expected length. This is done in a similar way to the method described at the end of 
Section 4.1. 

Remark: In this paper we consider the two-stage procedure when the inference of interest is 
a confidence interval for the slope parameter. As one would expect from the duality between 
hypothesis tests and confidence intervals, our results have important implications when the 
subsequent inference is a hypothesis test for the slope parameter. These implications are 
described in detail in the supplementary material. 


5 Conclusion 

The Hausman pretest is an example of preliminary statistical (i.e. data based) model selec¬ 
tion. Other examples include model selection by minimizing a criterion such as the Akaike 
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Information Criterion or the Bayesian Information Criterion. The effects of preliminary 
statistical model selection on confidence intervals can range from the benign to the very 
harmful, depending on the class of models under consideration, the known aspects of the 
model, the parameter of interest and the model selection procedure employed (Kabaila, 
2009 and Kabaila and Leeb, 2006). In other words, each case needs to be considered indi¬ 
vidually on its merits. Our results show that for the small levels of significance (such as 
5% or 1%) of the Hausman pretest commonly used in applications, the minimum coverage 
probability of the confidence interval for the slope parameter with nominal coverage prob¬ 
ability 1 — a can be far below nominal. The methodology that we have described makes 
it easy to assess, for a wide variety of circumstances, the effect of the Hausman pretest 
on the minimum coverage probability of this confidence interval. An interesting finding 
is that if we increase the significance level of the Hausman pretest to, say, 50% then this 
minimum coverage probability is much closer to the nominal coverage 1 — a for a wide range 
of parameters. However, regardless of the level of significance of the Hausman pretest, this 
interval is wider on average than the interval based on the fixed effects model which is 
adjusted to have the same minimum coverage. Therefore, the Hausman prestest should 
not be used in practice to choose between the random effects model and fixed effects model 
for clustered or longitudinal data when the subsequent inference of interest is a confidence 
interval for the slope parameter. 


Appendix A. Definition of the non-exogeneity 
parameter r 

It may be shown that the distribution of /q conditional on (xn ,..., x, l t) is normal with 
mean 

cr^rT 

(1 + (T - l)p) <7* Xiy 
where Xi = T~ 1 Y^t=i x iti an d variance 

2 A _ T 2 T \ 

"V 1 + CT-iW' 

This suggests that r = Corris a reasonable measure of the dependence between /q 
and (xn,... ,Xit) he. that it is reasonable to designate r as the non-exogeneity parameter. 


21 



It may be shown that Var (xi) = a 2 { 1 + (T — 1 )p)/T and Cov(/ij,£j) = Ta^a x . Thus 


t = T 


T 


1 + (T - 1 )p 


1/2 


Appendix B. Description of the estimators of the 
random error and random effect variances 


It has been suggested in the literature (see e.g. Hsiao, 1986 and Baltagi, 2005) that if a 
negative estimate of variance is observed then one should do as Maddala and Mount (1973) 
suggest and replace this negative estimate by 0. We use this kind of approach to ensure 
that a 2 is always positive and a 2 is always nonnegative. This ensures that the proofs of 
Theorems BUS and [6] carry through for each of the three pairs of estimators that we 
consider in this paper. We consider the following pairs of estimators of a 2 and a 2 : 

(1) The usual unbiased estimators. Define 


N 


%= 


EtwtEE 


N(T- 1) - 1 


i= 1 t =1 


and a 2 = max(0, a 2 ), where 


~2 _ 

M N 


N 

E 


N 


rf 


2 ^ ' 1 NT(T — 1) — T 

i= i y ’ i= 1 t= 1 


EE 


The ruS are the OLS residuals from model Q and the r*’s are the OLS residuals from 
model pi). Note that a 2 is an unbiased estimator of a 2 only for r = 0. 


(2) Hsiao’s (1986) maximum likelihood estimators a 2 and a 2 . We assume, of course, that 
the maximum likelihood estimator is obtained by maximizing the log-likelihood function 
subject to the parameter constraints a 2 > 0 and a 2 > 0. 

(3) Wooldridge’s (2002) estimators. Define a 2 = max(—ecif, a 2 ) where e is a very small 
positive number and 




1 

NT -K 


N T 




i 

NT{T - l)/2 - K 


N T—l T 


EE E 

i =1 t =1 s=t+l 


Also define a 2 


max(0, a 2 ) where 


a „ 


1 

NT(T — l)/2 — K 


N T—l T 


E E E ^ • 

i =1 t =1 s=t+l 
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Here, the r lt 's are the residuals from pooled OLS estimation for the model (JT|) and K = 0 
(no d.o.f. correction) or K = 2 (d.o.f. correction). 


Appendix C. Outline proofs of Theorems [5] and [6] 


Outline proofs of Theorems [lj [2] and [7] have been provided by Kabaila et ah, 2015. Here we 
provide outline proofs of Theorems [5] and [ 6 ] Detailed proofs of Theorems [l]-[7] can be found 
in the supplementary material. Theorems [5] and [ 6 ] hold for the three test statistics given 
by Hausman and Taylor (1981) and the three pairs of estimators described in Appendix B. 
For the sake of brevity, the outline proofs of these results are given only for the Hausman 
test statistic H(a £ ,a fJt ) and the unbiased estimators of oy and described in Appendix B. 


Proof of Theorem |5] 

Let h denote the statistic h, when ay and are replaced by the unbiased estimators d £ 
and cr M , respectively. Then B is equivalent to the statement that —£ 1 - 5/2 < h < £i_ 5 / 2 . 
Let x\ t = Xit/c t x , e\ t = £it/o £ and p\ = A^/o^. The joint distribution of the ej t ’s and the 
(/Tj, xn ,..., XitY s is determined by p and r. Now express h and oy/oy in terms of the 
e| t ’s, /i|’s and tp. Since c min and the joint distribution of the u it 's do not depend on any 
parameters, it follows that ([7]) is determined by the known quantities N, T, 1 — a and 5, 
and unknown parameters tp, p and r. 


Proof of Theorem |6] 

Assume that (al, Xu, ..., x^t) has a multivariate normal distribution with mean 0 and 
covariance matrix (|2j). Then r = r ((1 + (T — l)p)/T)^ 2 . By Theorem 0 the scaled 
expected length is a function of r. Let x* t = —Xu for i = 1,..., N and t = 1,... ,T. For 
r = d, (p,i, Xu ,..., Xtr) has a multivariate normal distribution with mean 0 and covariance 
matrix Q, where r = d ((1 + (T — 1 )p)/T) 1 ^ 2 . For r = —d, (/ij, x* 1; ..., x * T ) has the same 
distribution. It can be shown that the scaled expected length is the same function of the 
x*Ps, £it s and pPs as it is of the x^s, £us and pPs. Hence the scaled expected length is 
an even function of r. 
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SUPPLEMENTARY MATERIAL 


Supplementary Appendix: A remark on the two-stage procedure when the inference of 
interest is a hypothesis test and a remark on the choice of the scaling A = N 1 ^ 2 r. 
This supplementary appendix also includes detailed proofs of Theorems 1-7 and the 
definition of the non-exogeneity parameter r in the case of first order autoregression. 
(PDF hie) 
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